ci: performance gate — smoke tests + baseline comparison by tig · Pull Request #53 · gui-cs/Editor

tig · 2026-05-11T12:59:45Z

Summary

Two lightweight layers that catch performance regressions and celebrate improvements without slowing CI:

Layer 1: Performance smoke tests (xUnit)

Four Stopwatch-based tests in Terminal.Gui.Editor.Tests/PerformanceSmokeTests.cs run as part of the normal test suite on every CI run. Thresholds are deliberately fat (50–250x the typical time), so the tests fail only on catastrophic regressions, not on CI-runner noise.
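The smoke-test pattern can be sketched in Python. The real tests are C# xUnit tests built on Stopwatch, so this only shows the general idea: time one run, and fail only when it lands far beyond normal. Everything in the sketch is an illustrative assumption, not the repository's code:

- the helper `run_smoke_test`;
- its single warm-up call;
- the `line_lookup_workload` stand-in, which swaps the editor's line tree for `bisect` over hypothetical line-start offsets.

```python
import bisect
import time
from typing import Callable

# Hypothetical 100K-line document where every line is 40 characters long:
# LINE_STARTS[i] is the offset at which line i begins.
LINE_STARTS = [i * 40 for i in range(100_000)]
DOC_LENGTH = 100_000 * 40


def line_lookup_workload() -> None:
    """Stand-in for DocumentLineLookup_100K: 5000 offset-to-line lookups."""
    for i in range(5000):
        offset = (i * 7919) % DOC_LENGTH
        _ = bisect.bisect_right(LINE_STARTS, offset) - 1  # index of the containing line


def run_smoke_test(name: str, action: Callable[[], None], threshold_s: float) -> float:
    """Time one run of `action` and fail only if it reaches the fat threshold."""
    action()  # warm-up run (an assumption), so one-time setup is not timed
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    if elapsed >= threshold_s:
        raise AssertionError(
            f"{name}: {elapsed * 1000:.2f} ms reached threshold {threshold_s * 1000:.0f} ms"
        )
    return elapsed
```

Usage: `run_smoke_test("DocumentLineLookup_100K", line_lookup_workload, threshold_s=0.5)`. Because `threshold_s` sits far above the workload's normal cost, only a gross slowdown trips it.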

| Test | What it measures | Typical | Threshold |
|---|---|---|---|
| `BuildViewport_50Lines` | 50-line viewport build (10K doc) | ~200 µs | 50 ms |
| `BuildSingleLongLine` | 100 long-line builds (200 chars) | ~1.6 ms | 100 ms (raised from 10 ms) |
| `DocumentLineLookup_100K` | 5000 tree lookups (100K-line doc) | ~33 µs | 5 ms |
| `FullDocumentScroll_1K` | Full scroll simulation (1K lines) | ~4 ms | 200 ms |
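The "50–250x" headroom claim can be checked directly against the table. This sketch divides each threshold by its typical time. For `BuildSingleLongLine` it uses 100 ms, the value adopted after CI runners exceeded the original 10 ms; that original gave only 10 / 1.6 = 6.25x headroom.

```python
# Typical time and threshold for each smoke test, both in microseconds.
SMOKE_TESTS = {
    "BuildViewport_50Lines": (200, 50_000),
    "BuildSingleLongLine": (1_600, 100_000),  # 100 ms threshold after the bump
    "DocumentLineLookup_100K": (33, 5_000),
    "FullDocumentScroll_1K": (4_000, 200_000),
}


def headroom(typical_us: float, threshold_us: float) -> float:
    """How many times slower than typical a run can get before the test fails."""
    return threshold_us / typical_us


for name, (typical, threshold) in SMOKE_TESTS.items():
    print(f"{name}: {headroom(typical, threshold):.1f}x")
# BuildViewport_50Lines: 250.0x
# BuildSingleLongLine: 62.5x
# DocumentLineLookup_100K: 151.5x
# FullDocumentScroll_1K: 50.0x
```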

Layer 2: Benchmark baseline comparison (CI step)

A new CI step (Performance check, Ubuntu only) that:

- Runs VisualLineBuildBenchmarks (ShortRun, ~30s)
- Compares results to benchmarks/baseline.json
- Posts a markdown comparison table to the GitHub step summary
- Fails CI if any benchmark exceeds 3x baseline (egregious regression)
- Celebrates 🎉 if any benchmark drops below 0.8x baseline (nice improvement)
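The comparison rules above can be sketched as follows. This is not `benchmarks/compare-baseline.sh`, and it rests on a few assumptions:

- `baseline.json` is a flat mapping from benchmark name to mean time in nanoseconds.
- Benchmarks missing from either side are skipped.
- The names `classify`, `compare` and `write_step_summary` are hypothetical.

`GITHUB_STEP_SUMMARY` is the file path GitHub Actions provides for step-summary markdown.

```python
import os

REGRESSION_RATIO = 3.0   # fail CI if current / baseline exceeds this
IMPROVEMENT_RATIO = 0.8  # celebrate if current / baseline drops below this


def classify(ratio: float) -> str:
    if ratio > REGRESSION_RATIO:
        return "regression"
    if ratio < IMPROVEMENT_RATIO:
        return "🎉 improvement"
    return "ok"


def compare(baseline: dict[str, float], current: dict[str, float]) -> tuple[str, int]:
    """Build a markdown table and an exit code (1 if any benchmark regressed past 3x)."""
    lines = [
        "| Benchmark | Baseline (ns) | Current (ns) | Ratio | Status |",
        "|---|---:|---:|---:|---|",
    ]
    exit_code = 0
    for name in sorted(baseline):
        if name not in current:  # assumption: skip benchmarks missing from this run
            continue
        ratio = current[name] / baseline[name]
        status = classify(ratio)
        if status == "regression":
            exit_code = 1
        lines.append(
            f"| {name} | {baseline[name]:,.0f} | {current[name]:,.0f} | {ratio:.2f}x | {status} |"
        )
    return "\n".join(lines), exit_code


def write_step_summary(markdown: str) -> None:
    # GitHub Actions renders whatever is appended to this file on the run page.
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write(markdown + "\n")
```

In a CI step, the script would pass `table` to `write_step_summary` and exit with `code`. Any ratio above 3x then turns the step red, while an improvement only changes the summary.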

Updating the baseline

After a deliberate performance change (optimization or known cost increase):

```sh
# Re-run and update baseline.json with new numbers
dotnet run --project benchmarks/Terminal.Gui.Editor.Benchmarks -c Release -- --filter "*VisualLineBuild*"
# Edit benchmarks/baseline.json with the new means, commit
```
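The manual edit step could also be scripted. This sketch rests on several assumptions:

- BenchmarkDotNet's JSON exporter was enabled (for example with `--exporters json`), so the results directory contains `*-report*.json` files.
- Each benchmark entry carries `FullName` and `Statistics.Mean` (nanoseconds).
- `baseline.json` uses a flat name-to-mean layout.

`means_from_report` and `refresh_baseline` are hypothetical helpers, not part of the repository.

```python
import glob
import json
import os


def means_from_report(report: dict) -> dict[str, float]:
    """Map each benchmark's FullName to its mean (ns), skipping runs with no statistics."""
    return {
        b["FullName"]: b["Statistics"]["Mean"]
        for b in report.get("Benchmarks", [])
        if b.get("Statistics")
    }


def refresh_baseline(results_dir: str, baseline_path: str) -> dict[str, float]:
    """Collect means from every JSON report in results_dir and overwrite baseline_path."""
    means: dict[str, float] = {}
    for path in sorted(glob.glob(os.path.join(results_dir, "*-report*.json"))):
        with open(path, encoding="utf-8") as f:
            means.update(means_from_report(json.load(f)))
    if not means:
        # Refuse to write an empty baseline if BDN produced no usable report.
        raise SystemExit(f"no benchmark means found under {results_dir}")
    with open(baseline_path, "w", encoding="utf-8") as f:
        json.dump(means, f, indent=2, sort_keys=True)
        f.write("\n")
    return means
```

A typical call would be `refresh_baseline("BenchmarkDotNet.Artifacts/results", "benchmarks/baseline.json")`, followed by reviewing the diff before committing.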

Test plan

- `dotnet build Terminal.Gui.Text.slnx` succeeds
- All 57 editor tests pass (53 existing + 4 new smoke tests)
- `dotnet format --verify-no-changes` clean
- Smoke tests complete in <1s total

🤖 Generated with Claude Code

Two layers that catch regressions without slowing CI:

1. PerformanceSmokeTests (xUnit, runs in normal test suite):
   - Stopwatch-based with fat thresholds (50–250x headroom)
   - Catches catastrophic regressions only
   - 4 tests: viewport build, long-line build, 100K-line tree lookup, full 1K-line scroll
2. Benchmark baseline comparison (CI step, Ubuntu only):
   - Runs VisualLineBuild benchmarks (ShortRun, ~30s)
   - Compares to benchmarks/baseline.json
   - Fails CI if any benchmark > 3x baseline (regression)
   - Celebrates in step summary if any < 0.8x baseline (improvement)
   - Results posted to GitHub step summary as markdown table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CI runners (shared, no turbo) are 2–4x slower than local M-series machines. The 10 ms BuildSingleLongLine threshold was too tight: Ubuntu hit 23 ms, macOS 38 ms, Windows 20 ms. Bump it to 100 ms to keep fat headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
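The numbers in this commit message work out as follows. Every observed CI time exceeded the old 10 ms limit. The new 100 ms limit leaves at least a 2.6x margin over the slowest runner (macOS, 38 ms). A small Python check of that arithmetic, using only the timings quoted above:

```python
OLD_THRESHOLD_MS = 10
NEW_THRESHOLD_MS = 100
observed_ms = {"ubuntu": 23, "macos": 38, "windows": 20}  # CI timings from the commit message


def margin(threshold_ms: float, observed: float) -> float:
    """How many times slower a runner could get before hitting the threshold."""
    return threshold_ms / observed


for runner, ms in observed_ms.items():
    old = "fail" if ms >= OLD_THRESHOLD_MS else "pass"
    print(f"{runner}: {ms} ms, {old} at {OLD_THRESHOLD_MS} ms, "
          f"{margin(NEW_THRESHOLD_MS, ms):.1f}x margin at {NEW_THRESHOLD_MS} ms")
# ubuntu: 23 ms, fail at 10 ms, 4.3x margin at 100 ms
# macos: 38 ms, fail at 10 ms, 2.6x margin at 100 ms
# windows: 20 ms, fail at 10 ms, 5.0x margin at 100 ms
```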

Split performance work into its own csproj and CI workflow so the correctness-focused CI stays fast across all three OSes and the perf gate stops being a silent no-op.

New layout

- tests/Terminal.Gui.Editor.PerformanceTests/
  - PerformanceSmokeTests.cs (moved from Editor.Tests/)
  - Terminal.Gui.Editor.PerformanceTests.csproj
- .github/workflows/perf.yml (ubuntu-latest only)
  - Release build
  - Run PerformanceTests (stopwatch smoke tests)
  - Run benchmarks/compare-baseline.sh (VisualLineBuild gate)
  - workflow_dispatch with `full-suite: true` runs the full BenchmarkDotNet matrix and uploads results as an artifact; this is the operator path for refreshing baseline.json (#78).
- .github/workflows/ci.yml
  - Perf step removed; a comment points to perf.yml.

Why a separate workflow

- Windows / macOS GitHub-hosted runners share hosts with neighbour VMs; wall-time assertions there are too noisy to gate on. Linux runners are still noisy but consistent enough for a 3× threshold.
- The full BDN suite takes minutes; CI for correctness needs to be fast. Per-PR perf only runs the focused VisualLineBuild filter.

Fix while we're here

compare-baseline.sh used `--job ShortRun`, which BenchmarkDotNet rejects ("invalid base job"). BDN exited without running any benchmarks, the script saw no JSON report, warned "skipping comparison", and exited 0. So the perf gate has been a silent no-op since PR #53: neither the >3× fail nor the <0.8× celebrate could ever fire (see issue #78; PR #77 didn't trigger the celebration for exactly this reason). Switched to `--job short` (the lowercase form BDN accepts) and added a comment documenting the history.

Tests on this branch (local Release)

- Text.Tests: 230 passing
- Editor.Tests: 87 passing (was 91; 4 perf tests moved out)
- IntegrationTests: 108 passing
- PerformanceTests: 4 passing (new project)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
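Why a rejected job name made the gate silent can be modelled in a few lines. This Python sketch mirrors only the control flow described above: look for a JSON report, and otherwise warn and exit 0. It is not the actual shell script, and `perf_gate` and its `compare` callback are hypothetical. With no report on disk, the result is 0 no matter what the comparison would have returned.

```python
import glob
import os
import sys
from typing import Callable


def perf_gate(results_dir: str, compare: Callable[[str], int]) -> int:
    """Return the gate's exit code given BDN's results directory.

    `compare` stands in for the baseline check: it takes a report path and
    returns 1 for a >3x regression, 0 otherwise.
    """
    reports = sorted(glob.glob(os.path.join(results_dir, "*-report*.json")))
    if not reports:
        # Pre-fix path: `--job ShortRun` was rejected, BDN ran nothing and wrote
        # no report, so the script warned and exited 0 before comparing anything.
        print("warning: no benchmark report found, skipping comparison", file=sys.stderr)
        return 0
    return max(compare(path) for path in reports)
```

Once `--job short` lets BDN write a report, the `compare` result reaches the exit code again.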

tig and others added 2 commits May 11, 2026 06:59

tig merged commit 27d0b7e into develop May 11, 2026
6 checks passed

tig deleted the ci/perf-gate branch May 11, 2026 13:33

tig mentioned this pull request May 11, 2026

Rendering: add visual-line cache and invalidation (B1 follow-up) #49

Closed

Copilot AI mentioned this pull request May 11, 2026

Editor rendering: add per-line visual cache with document-change invalidation #54

Closed

tig mentioned this pull request May 12, 2026

perf: dedicated test project + workflow (ubuntu-only) #89

Merged

tig merged 2 commits into develop from ci/perf-gate
